Modeling with Structured Priors for Text-Driven Science by Michael J. Paul
Abstract
Many scientific disciplines are being revolutionized by the explosion of public data on the web and social media, particularly in health and social sciences. For instance, by analyzing social media messages, we can instantly measure public opinion, understand population behaviors, and monitor events such as disease outbreaks and natural disasters. Taking advantage of these data sources requires tools that can make sense of massive amounts of unstructured and unlabeled text. Topic models, statistical models that posit low-dimensional representations of data, can uncover interesting latent structure in large text datasets and are popular tools for automatically identifying prominent themes in text. For example, prominent themes of discussion in social media might include politics and health. To be useful in scientific analyses, topic models must learn interpretable patterns that accurately correspond to real-world concepts of interest. This thesis will introduce topic models that can encode additional structures such as factorizations, hierarchies, and correlations of topics, and can incorporate supervision and domain knowledge. For example, topics about elections and Congressional legislation are related to each other (as part of a
Similar resources
SPRITE: Generalizing Topic Models with Structured Priors
We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hi...
Factorial LDA: Sparse Multi-Dimensional Text Models
Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured...
Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models
Multi-dimensional latent text models, such as factorial LDA (f-LDA), capture multiple factors of corpora, creating structured output for researchers to better understand the contents of a corpus. We consider such models for clinical research of new recreational drugs and trends, an important application for mining current information for healthcare workers. We use a “three-dimensional” f-LDA va...
Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions
Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of a multi-dimensional latent text model – factorial LDA – that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsuperv...
SceneSuggest: Context-driven 3D Scene Design
We present SCENESUGGEST: an interactive 3D scene design system providing context-driven suggestions for 3D model retrieval and placement. Using a point-and-click metaphor we specify regions in a scene in which to automatically place and orient relevant 3D models. Candidate models are ranked using a set of static support, position, and orientation priors learned from 3D scenes. We show that our ...